Prognostic-related genes for pancreatic cancer typing and immunotherapy response prediction based on single-cell sequencing data and bulk sequencing data

Background Pancreatic cancer is associated with high mortality and is one of the most aggressive of malignancies, but studies have not fully evaluated its molecular subtypes, prognosis and response to immunotherapy of different subtypes. The purpose of this study was to explore the molecular subtypes and the key genes associated with the prognosis of pancreas cancer patients and study the clinical phenotype, prognosis and response to immunotherapy using single-cell seq data and bulk RNA seq data, and data retrieved from GEO and TCGA databases. Methods Single-cell seq data and bioinformatics methods were used in this study. Pancreatic cancer data were retrieved from GEO and TCGA databases, the molecular subtypes of pancreatic cancer were determined using the six cGAS-STING related pathways, and the clinical phenotype, mutation, immunological characteristics and pathways related to pancreatic cancer were evaluated. Results Pancreatic cancer was classified into 3 molecular subtypes, and survival analysis revealed that patients in Cluster3 (C3) had the worst prognosis, whereas Cluster1 (C1) had the best prognosis. The clinical phenotype and gene mutation were statistically different among the three molecular subtypes. Analysis of immunotherapy response revealed that most immune checkpoint genes were differentially expressed in the three subtypes. A lower risk of immune escape was observed in Cluster1 (C1), indicating higher sensitivity to immunotherapeutic drugs and subjects in this Cluster are more likely to benefit from immunotherapy. The pathways related to pancreatic cancer were differentially enriched among the three subtypes. Five genes, namely SFRP1, GIPR, EMP1, COL17A and CXCL11 were selected to construct a prognostic signature. Conclusions Single-cell seq data were to classify pancreatic cancer into three molecular subtypes based on differences in clinical phenotype, mutation, immune characteristics and differentially enriched pathways. Five prognosis-related genes were identified for prediction of survival of pancreatic cancer patients and to evaluate the efficacy of immunotherapy in various subtypes.


Introduction
Pancreatic cancer is associated with high mortality, ranking third in cancer-related deaths globally. The 5-year survival rate of pancreatic cancer is 3%. The incidence of pancreatic cancer is 1.6%, but the incidence increases due to smoking, obesity and other related factors [1]. Exocrine pancreatic ductal adenocarcinoma (PDAC) represents 90% of all the pancreatic cancer cases, whereas endocrine pancreatic carcinoma accounts for less than 5% of all the cases [2]. Diagnosis of pancreatic cancer at an early stage is challenging due to the lack of specific clinical symptoms and effective diagnostic methods. This cancer is characterized by high aggressiveness, but most cases are diagnosed in the late stage, and only 15% and 20% of the patients are eligible for surgery at diagnosis [3]. Surgery, chemotherapy, radiotherapy and palliative treatment are the conventional treatment strategies for the pancreatic cancer. Surgery is the curative treatment option for pancreatic cancer [4]. Neoadjuvant chemoradiotherapy effectively shrinks the tumor, thus some patients with advanced disease can undergo surgery and achieve longer expected survival [5]. Nevertheless, these subjects can develop resistance to chemotherapy due to the metastatic and aggressive nature of pancreatic cancer [6]. Therefore, this treatment option may not be effective in most patients. Immunotherapy has become an important treatment strategy in addition to surgery and chemotherapy [7], however, its efficacy against pancreatic cancer is limited compared with other tumors due to resistance induced by innate or adaptive immune effects [8,9]. As a result, various treatment plans and individualized therapy have been proposed to improve the outcomes of pancreatic cancer patients.
Cancers were previously classified according to their tissue of origin and clinical decisions on different treatment plans were made based on this classification. However, subsequent studies reported that patients with pathologically identical tumors had different prognoses. and Subtyping of cancers based on the similarities and differences in biologically relevant molecular similarities and differences can improve the conventional classification methods. Conventional classification methods can be optimized through accurate morphological and imaging analysis to improve the selection of systemic treatment regimens, individualized patient management, (primary surgery and neoadjuvant therapy), recruitment to clinical trials based on response prediction, and the evolution of treatment and research [10]. A recent study reported differences in gene expression within cells and altered interactions between cancer cells and various components of the tumor microenvironment (TME) [11]. Several cancer treatment options are being optimized or emerging due to advances in molecular typing. However, these methods have not been fully developed for pancreatic cancer compared with other cancers, such as breast and colon cancer. Moreover, the clinically relevant morphological or molecular subtypes of pancreatic cancer have not been fully elucidated [10]. A combination of immune gene expression and tumor cell expression analysis can provide essential information for developing methods for pancreatic cancer treatment [12].
Chronic pancreatitis is a risk factor for pancreatic cancer, and previous studies report a 7.2-fold higher risk for patients with a history of pancreatitis [13][14][15][16][17]. The sustained activation of the cGAS-STING pathway and its downstream effectors are associated with chronic inflammation and progression of cancer [18,19]. Therefore, the role of the cGAS-STING pathway in pancreatic cancer worth an in-depth study.
Single-cell sequencing is the amplification and sequencing of RNA or DNA extracted from a single cell. This technique can be used to accurately determine the genetic profile and expression status of a single cell, providing a basis for evaluating the heterogeneity of cells with the same phenotype [20]. In addition, this technique can provide new clues and insights into the mechanisms of tumorigenesis, metastasis and progression, the origin of tumors, differentiation of tumor stem cells and resistance to therapy. Therefore, the technique has high theoretical potential in evaluating the molecular subtypes of pancreatic cancer. TME is the cellular environment around tumors, including tumor cells, fibroblasts, mesenchymal cells, blood and lymphatic vessels, as well as various tumor-infiltrating immune cells and related chemokines and cytokines [21]. TME plays a critical role in tumorigenesis and progression [22,23]. Exploring the various biological processes that occur in TME and tumor immune microenvironment (TIME) is essential for exploring tumor evolution mechanism and developing new tumor immunotherapy options [24] .
In this study, we comprehensively evaluated the association between the molecular subtypes of pancreatic cancer and immune microenvironment. In addition, the correlation between expression of key genes and tumor immunity was investigated. The roles of critical genes in pancreatic cancer molecular subtypes were explored. These findings provide insight into underlying prognostic biomarkers related to immune infiltration of pancreatic cancer. Further, a comparison of the pathological features of the various molecular subtypes of pancreatic cancer was conducted and a risk model was constructed using the identified key genes.
TCGA data retrieval and pre-processing Data on clinical phenotypes of pancreatic cancer were retrieved from the TCGA database (https://portal.gdc.cancer. gov/), comprising 176 tumor samples.
CNV mutation data on the Masked Copy Number Segment type of pancreatic cancer were obtained from TCGA database. The CNV results were integrated using Gistic2 software.
Mutect2 software was used to calculate the SNV mutation information of the TCGA-PAAD gene.
GEO data retrieval and pre-processing Five sets of microarray data were retrieved from GEO database. The data comprised 42 tumor tissues from the GSE28735 dataset, 63 tumor tissues from the GSE57495 dataset, 66 tumor tissues from the GSE62452 dataset, 125 tumor tissues from the GSE71729 dataset, and 79 tumor tissues from the GSE85916 dataset.
The "limma" and "sva" packages in R were used to eliminate the batches of each sample and the "normalizeBetweenArrays" function were used for standardization of the five datasets.
cGAS-sting-related pathway retrieval The six pathways associated with cGAS-STING were obtained based previous literature [25].

Single-cell cluster dimension reduction analysis
The single cell data was screened and 10,232 cells were obtained. The percentage of rRNA and mitochondria was calculated using the "PercentageFeatureSet" function in R and 9,292 cells were obtained.
The "log-normalization" function in R was used to normalize the data of the three pancreatic cancer samples downloaded from the GEO database. The highly variable genes were identified using "FindVariableFeatures" function in R. Batch effects were eliminated using the CCA method in "FindIntegrationAnchors" and "IntegrateData" functions. A dim = 35 was chosen and the cells were Clustered using the "FindClusters" and "FindNeighbors" functions (with resolution = 0.1) by scaling all genes using the "ScaleDate" function and the anchors were identified through "PCA" dimensionality reduction.
Immune cells were extracted and re-Clustered by Resolution = 0.1, resulting in 10 subpopulations. UMAP (Uniform Manifold Approximation and Projection) dimensionality reduction analysis was performed on 9884 cells using the "RunTSNE" function, annotating with classical marker.
Marker genes of the six subgroups were screened using the "FindAllMarkers" function with a logFC of 0.5 (difference ploidy), a minpct (minimum percentage of differentially expressed genes) of 0.5 and an adjusted p < 0.05.
Further, the proportions of these six subgroups in each sample were determined BP enrichment analysis was performed using the "ClusterProfiler" package.

Dysregulation of immune cells TME
The "copykat" package was used to predict "CNV" changes in scRNA-seq immune cell data to distinguish tumor cells from normal cells in each sample. cGAS.STING related pathways were retrieved and the scores of malignant and nonmalignant cells were calculated using "ssGSEA" method in "GSVA" package. The "z-score" was standardized based on the enrichment score for each pathway.
Identification of key genes in Bulk RNA-seq ssGSEA was performed to determine the score of cGAS. STING-related pathways and the CD8 T score of patients in the TCGA dataset. The genes encoding the proteins were correlated with the cGAS.STING-related pathway score and CD8 T score by pearson correlation analysis, respectively.
Prognostic genes were identified from the key genes through univariate Cox analysis using the survival function in R (p < 0.01).
Identification of pancreatic cancer molecular subtypes based on CD8 T cells and genes associated with cGAS.STING pathways The "Pearson" and "PAM" algorithms were employed as the metrics distance and 500 bootstraps were performed to identify the molecular subtypes using the "ConsensusClusterPlus" R package. The training set comprised 80% of the patients. Tumor tissue in the TCGA dataset were used to classify patients based on consistent Clustering of 26 key gene expression profiles. The optimal number of Clusters was determined based on the cumulative distribution function (CDF).
The "limma" and "sva" packages were used to eliminate batch effects from the five GEO datasets. The "normalizeBetweenArrays" function was used to re standardize the data.

Comparison of the clinical phenotypes of molecular subtypes
The distribution of different clinical features in the TCGA dataset were compared among the three molecular subtypes to assess whether the clinical features differed among the subtypes (chi-square test).

Mutational characteristics of the molecular subtypes
The SNV mutation data were obtained from the TCGA dataset using mutect2. The first ten genes with the most significant mutations were selected from every subtype. The distributions of Fraction Altered, Homologous Recombination Defects, tumor mutation burden and Number of Segments were compared among the subtypes.

Immunologic characteristics of molecular subtypes and differences between immunotherapy and chemotherapy
The level of infiltration of immune cells in the TCGA cohort were evaluated to determine the differences in the immune microenvironment of patients among various molecular subtypes based on the levels of gene expression in immune cells. The scores of the 22 immune cells were evaluated using the "CIBERSORT" algorithm and "Kruskal.test" was conducted to identify differences between the three subtypes. "ESTIMATE" tool was utilized to evaluate the level of immune cell infiltration. The expression levels of immune checkpoint genes in the three subtypes were evaluated.
We then evaluated the difference in immunotherapy efficacy between the different subtypes. The clinical effectiveness of immunotherapy in the molecular subtypes was evaluated through using the "TIDE" (http://TIDE.dfci. harvard.edu/) tool.

Pathway analysis of molecular subtypes
The pathway scores for the various molecular subtypes of the samples were calculated using the "c2.cp.kegg.v7.5.1.symbols. gmt" function in the "GSVA" package.
GSEA was conducted using the GSEA software with "h. all. v7.5. symbols. gmt" as the background set.
The differences in 10 oncogenic pathways reported in previous studies [26] were evaluated in three molecular subtypes.
Construction of risk models on the basis of key phenotypic genes of CD8 T and cGAS.STING cells Differential analysis for Clust1 vs. no_Clust1, Clust2 vs. no_Clust2, and Clust3 vs. no_Clust3 was conducted using "Limma" package in R. FDR < 0.05 and |log2(Fold Change)| > 1 were used as the criteria.
A single-factor Cox regression analysis was conducted using 'survival' package to identify differentially expressed genes with p < 0.01.
We used lasso (Least absolute shrinkage and selection operator, Tibshirani (1996)) regression to further compress this prognosis-related genes in the TCGA dataset to reduce the number of genes in the risk model. Further we used stepwise multifactor regression analysis based on the genes from the lasso analysis results, starting with the most complex model using the stepAIC method in the "MASS" package and sequentially removing one variable to reduce the AIC (Akaike information criterion).
The TCGA dataset was utilized as the training dataset and risk scores were calculated separately for each sample based on the expression levels of prognosis-related genes. ROC analysis of the prognostic classification of the Riskscore was carried out with the "timeROC" R package. The prognostic classification efficiency was evaluated at 1, 3 and 5 years. At the same time, the zscore of Riskscore was conducted, and samples with Riskscore greater than zero were classified as high-risk group and samples with less than zero were classified as low-risk group after zscore, and KM survival curves were generated. To better validate the robustness of the model, the merged GSE dataset is validated using the same method.

Evaluation of the Riskscore using samples with different clinicopathological features
The relationship between clinical tumor features and Riskscore scores between the different clinical phenotypes of the TCGA dataset were evaluated to determine its clinical significance.
Integration of Riskscore with clinicopathological characteristics to improve prognostic models and survival prediction The risk scores were combined with clinical features and with univariate and multivariate Cox regression analyses conducted. A nomogram was constructed by combining the Riskscore with other clinicopathological characteristics for the risk assessment and survival prediction.
Validation of the expression levels of the selected genes through qRT-PCR Chondrocytes from normal human (C-7, Procell, Wuhan, Hubei, China) and pancreatic cancer patients (panc-1, patu8988, bxpc-3 Hoge Biotechnology Co., Ltd., Shanghai, China) were cultured in chondrocytes containing 10% fetal bovine serum (FBS, 10099, Thermo Fisher Scientific, Massachusetts, USA) in DMEM/F12 medium (SH30126.01, HyClone Technologies, Logan, USA). The relative expression of EMP1, GIPR, SFRP1, CXCL11, and COL17A mRNA was detected after 48 h of incubation. Primers were designed using DNAMAN software and synthesized by Shanghai Biotechnology Co., China. Cellular RNA was extracted using TRIzol (Invitrogen #15596-026). cDNA was synthesized using PrimeScript TM RT kit with gDNA Eraser (Takara #RR047A) and SYBR Green qPCR Mix (Beyotime #D7260). Amplification cycle was 40 cycles using 7500 Real-Time Polymerase Chain Reaction (RT-PCR) system. PCR data were treated with GAPDH as an internal reference and the relative expression in the samples was calculated using the ΔΔCT method.

Single-cell clustering and dimensionality reduction analysis
The results revealed that the amounts of mRNA and UMI were significantly correlated, but the amount of UMI/ mRNA was not significantly correlated with the expression level of mitochondrial genes (Suppl. Fig. S1A). A violin plot before and after quality control of the data is presented in Suppl. Figs. S1B and S1C.
The immune cells were extracted, re-Clustered, then further evaluated to obtain 10 distinct subpopulations. UMAP dimensionality reduction analysis was conducted for 9884 cells and the data were annotated by classical marker. Subpopulation 6 for CD8 T cells (expressing CD3D and GZMA). Subpopulations 1, 4 and 9 comprised B cells (expressing MS4A1, CD19 and CD79); subpopulation 7 comprised plasma cells (expressing CD79A and JSRP1); subpopulation 8 consisted of mast cells expressing (TPSAB1 and CPA3); subpopulation 0, 2 and 3 consisted of macrophages (expressing CD163, CD68 and CD14); subpopulation 6 consisted of DC (expressing CD1C and CD1E) (Suppl. Fig. S2). A UMAP plot showing the distribution of the three samples is presented in Fig. 1A. A UMAP plot of the different subpopulations after Clustering is shown in Fig. 1B. A UMAP plot of the distribution of cells after annotation is presented in Fig. 1C.
The first five marker genes that significantly contributed to expression in each subgroup were selected (Fig. 1D). The findings for the marker genes are presented in Table scRNA_marker_gene.txt.
The proportions of each sample for these 6 subpopulations were determined (Fig. 1E) and BP enrichment analyses were performed using the "ClusterProfiler" package ( Fig. 1F).
Immunologic dysregulation in single-cell TME In this study, 6,812 cancer cells and 10,536 normal cells were identified. A UMAP diagram was generated using the copykat package to distinguish between normal and tumor cells ( Fig. 2A). The proportion of non-malignant cells (diploid) and malignant cells (aneuploid) in each sample was evaluated (Fig. 2B).
The results showed that the correlation scores for cGAS. STING were lower in in CD8T cells for the malignant cells compared with the non-malignant cells (Fig. 2C).
Key genes were identified through bulk RNA-seq A total of 964 key genes associated with cGAS.STING and CD8 T cells were obtained from the previous analysis (Fig. 3A).
Univariate Cox analysis of the 964 genes revealed that 26 genes were significantly correlated with survival of prostate cancer patients (Fig. 3B). Pearson correlation analysis of the 26 prognosis-related genes indicated that they were significantly correlated with each other (Fig. 3C).

Identification of molecular subtypes based on CD8 T cells and cGAS.SING-related genes
The optimum number of Clusters was determined based on the cumulative distribution function (CDF). The CDF delta area curve showed that the Clustering results were more stable when the Cluster size was chosen as 3 (Figs. 4A and 4B), so k = 3 was used to obtain the three molecular subtypes (Fig. 4C). Further analysis of the prognostic features for the three molecular subtypes indicated distinct prognostic differences among the subtypes (Fig. 4D). Clust3 had the worst prognosis, followed by Clust2, whereas Clust1 had the best prognosis. Table tcga.subtype.txt shows the data for the TCGA dataset subtypes. Fig. 4E shows the prognostic relational KM curves of the three subtypes in the combined GEO cohort.
PCA was conducted after renormalization of data to view the distribution of GEO data before and after the batch was removed (Suppl. Fig. S3). The results showed that the data sets of each sample after the batch removal were Clustered together, indicating that there was no batch effect in the merged data.

Comparison of clinical phenotypes of the molecular subtypes
The distribution of the various clinical features in the three molecular subtypes from the TCGA dataset was compared to determine whether the clinical characteristics differed among the different subtypes (chi-square test). The results revealed significant differences in gender, tumor grade and survival status between the three subtypes in the TCGA dataset (Fig. 5).

Mutational characteristics of molecular subtypes
The "SNV" mutation data were obtained from TCGA database using "Mutect2". The top 10 genes with the most significant mutations in each subtype are presented in Fig. 6A. The results indicated that Fraction Altered, homologous recombination defects, tumor mutation burden and number of segments were significantly different between the various subtypes (Fig. 6B).
Immunologic characteristics of the molecular subtypes and differences between chemotherapy and immunotherapy The scores of 22 immune cells were determined with the "CIBERSORT" algorithm and a "Kruskal.test" was conducted to evaluate the differences in scores among the three subtypes (Fig. 7A). The immune cell infiltration levels were assessed using the ESTIMATE tool (Fig. 7B). The findings indicated that the "ImmuneScore" was highest for the Clust2 subtype, whereas that of Clust1 was lowest. The expression levels of immune checkpoint genes were evaluated in three subtypes and significant differences in expression levels were observed between the three subtypes (Fig. 7C).
Analysis of the differences in response to immunotherapy of the various subtypes revealed that the TIDE score for the Clust1 subtype was lower than in Clust2 and Clust3 in the TCGA cohort (Fig. 7D). In addition, the response level of various molecular subtypes to conventional chemotherapeutic agents in TCGA cohort was evaluated. The results indicated that Clust3 was more sensitive to the chemotherapy agents compared with Clust1 and Clust2 (Fig. 7E, Wilcoxon test).

Pathway analysis of molecular subtypes
The "kruskal.test" function was used to determine the significance of the pathway score among for the three subtypes, and key pathways were determined using p < 0.001 (Fig. 8A) Further, of the differences in 10 previously reported oncogenic pathways were evaluated among the three molecular subtypes. The findings showed significant differences in the evaluated pathways except he TP53 and PI3K pathways (Fig. 8E, Kruskal test).

Identification of key genes and construction of risk models for CD8 T cells and cGAS.STING phenotype
In this study, 15 up-regulated genes and 225 down-regulated genes were identified by comparison of Clust1 with the other Clusters. Further, 175 up-regulated genes, 11 downregulated genes were identified after comparison of Clust2 with the other two Clusters. A total of 21 up-regulated genes and 9 down-regulated genes were identified by the comparison between Clust3 and the other Clusters. The details differential expression of genes are shown in Table tcga.subtype.c1vsno_c1.txt,tcga.subtype.c2vsno_c2.txt   and tcga.subtype.c3vsno_c3.txt. In this study, 348 differentially expressed genes were identified as shown in Table com_gene.txt.
Univariate cox analysis was conducted for the 348 differentially expressed genes using the "survival" package and 60 prognosis-related genes were selected with p < 0.01 as the cut-off.
Lasso regression was conducted to determine the independent genes associated with the prognosis of patients from the 60 prognosis-related genes in TCGA dataset to construct a risk model. The trajectory of each independent variable was evaluated as shown in Fig. 9A. The results showed that the number of independent variable coefficients decreased with increase in lambda variable. A 10-fold cross validation was used for model validation and the confidence interval under each variable was determined, confidence intervals are shown in Fig. 9B. In this study, 13 genes were selected as the next target genes when lambda was 0.0452.
We analyzed the classification efficiency of prognostic prediction at 1, 3, 5 years, respectively, where the AUC reached 0.7 at 1, 3, 5 years, and also zscore the Riskscore, classify the samples with Riskscore greater than zero as high risk group and those with less than zero as low risk group after zscoreization, and plot KM curves, and found that they had highly significant differences p < 0.0001 (Figs. 9C and 9D).
To better validate the robustness of the model, we used the merged GSE dataset to validate using the same method, and similar results were obtained (Figs. 9E and 9F).

Comparison of Riskscore with different clinicopathologic features
Comparisons of differences in the clinicopathological characteristics between the Riskscore subgroups in the TCGA cohort showed similar results (Fig. 10A). The findings indicated that the risk score increase with increase in clinical grade (Fig. 10B).

Combination of
Riskscore with clinicopathological characteristics further improved the survival prediction of the constructed model Univariate and multivariate Cox regression analyses of the clinical characteristics and Riskscore demonstrated that Riskscore was the most significant prognostic factor (Figs. 11A and 11B). We combined the Riskscore with other clinicopathologic features and established a nomogram to quantify risk evaluation and probability of survival of patients (Fig. 11C). Riskscore had the greatest impact on survival prediction. Calibration curves were generated to assess the model's prediction accuracy (Fig. 11D). The findings revealed that the calibration and prediction curves for three calibration points at 1, 2 and 3 years almost overlapped with the standard curve, indicating that the nomogram had good predictive performance. Evaluation of the reliability of model with a decision curve analysis (DCA) revealed that the survival prediction accuracy of the nomogram and Riskscore was significantly higher compared with the clinicopathological characteristics (Figs. 11E  and 11F).

qRT-PCR
We performed qRT-PCR to verify the bioinformatics results. The experimental results showed that the mRNA expression levels of EMP1, GIPR, SFRP1, CXCL11 and COL17A were significantly high in the pancreatic cancer cell line compared with the controls. In this study, COL17A demonstrated the most significant expression difference between the samples. These findings indicate that the results obtained through bioinformatics analysis were reliable and have potential research value (Fig. 12).

Discussion
The treatment of pancreatic cancer is mainly based on the histopathologic difference between its subtypes, but the results have been unsatisfactory. Significant progress has been achieved in the molecular typing of cancer in recent years owing to the rapid advances in high-throughput sequencing and single-cell sequencing. Molecular typing is effective in elucidating the occurrence and progression of tumors and development of individualized treatment. In this study, we found that the correlation score of cGAS.STING was lower in malignant cells than in non-malignant cells in CD8T cells. As a result, 26 key genes were identified and pancreatic cancer was grouped into three molecular subtypes. KM survival curves of the three subtypes indicated that Clust1 subtype had the highest prognosis compared with the other subtypes.
The conventional treatment approaches mainly used for pancreatic cancer include surgery, chemotherapy and neoadjuvant chemotherapy. Most pancreatic cancer patients are not eligible for curative surgical treatment during diagnosis because most cases are diagnosed at advanced stages. Immunotherapy is effective in treating pancreatic cancer. Therefore, we analyzed the immune characteristics of the three molecular subtypes, and the results showed that Clust1 had the lowest ImmuneScore. TIDE scores were lower for the Clust1 subtype compared with lust2 and Clust3, indicating that the Clust1 subtype had a higher risk of immune escape and had higher potential response to immunotherapy. These findings provide potential avenues for the treatment of pancreatic cancer. Notably, Cluster3 (C3) was more sensitive to conventional chemotherapeutic drugs, indicating that different therapeutic regimens can be used for the different molecular types to improve treatment outcomes.
Differentially expressed genes between the three subtypes were evaluated and EMP1, GIPR, SFRP1, CXCL11 and COL17A1 were used to establish risk prediction and prognostic models. Analysis of the TCGA database showed that these five genes were significantly associated with the survival of patients. qRT-PCR showed a high expression levels of these five genes in pancreatic carcinoma.
EMP1 is a member of peripheral myelin protein 22-kDa (PMP22) gene family. Dysregulation of EMP1 expression is linked to epithelial diseases, particularly human cancers. EMP1 is mainly expressed in the early and immature neurons [27] and glioma [28], and in sclerotic gastric carcinoma cells [29]. EMP1 expression level can be used to to differentiate between invasive ductal and lobular breast carcinoma. This gene is associated with the ability of breast cancer to metastasize in vitro [30,31]. Microarray analysis showed that EMP1 was up-regulated by Her-2 in cDNA [32]. EMP1 has also is a bio-marker for HER-2 activation in breast cancer and is associated with lymph node metastasis in squamous cell carcinoma of the oral cavity [33]. EMP1 is correlated with clinical resistance to gefitinib in lung cancer [34]. Further, it is associated with in vitro resistance to prednisolone in leukemia cells [35]. High EMP1 expression level is associated with a low five-year event-free survival rate in precursor-B ALL patients [36].
Glucose-dependent insulin-stimulating peptide (GIP) is an insulin-stimulating hormone produced in the digestive tract and its release is induced by food intake [37]. It is a 42-amino acid peptide hormone previously associated with the suppression of gastric acid secretion [38]. It is mainly released in K cells of proximal small intestine (jejunum and duodenum) [39]. The main metabolic function of GIP is stimulating glucose-dependent pancreatic β cells to release insulin [40]. GIPR expression is dysregulated in several transformed endocrine tissues and may play a role in the etiopathogenesis of hormone secretion disorders [41].
SFRP1 is a member of the cysteine-rich structural domain of the SFRP family proteins. The cysteine-rich structural domain mimics the Wnt binding site of the coiled-coil receptor [42]. Proteins of the SFRP family modulate the Wnt signaling. SFRP1 inhibits WNTdependent transcription and reduces intracellular β-catenin levels [43]. Several studies report that low expression level of SFRP1 is correlated with a poor prognosis in cholangiocarcinoma, lung, breast as well as hepatocellular carcinoma subjects [43][44][45][46].
Chemokine (C-X-C motif) ligand 11 (CXCL11), also known as interferon-inducible protein 9 (IP-9) or interferon-induced T-cell alpha chemokine (I-TAC), is a complex cytokine with many functions in different tumors. It is mainly expressed in the thymus, pancreas, lung, liver, spleen and peripheral leukocytes, but low expression levels are observed in the intestine, placenta and prostate [47]. CXCL11 promotes recruitment of activated NK, CTL and Th1 cells in the tumor tissue in vivo [48,49]. CXCL11 increases the frequency of tumor-infiltrating lymphocytes and inhibits tumor growth in both breast cancer and T-cell lymphoma [49][50][51]. CXCL11 exhibits potent anti-tumor activity in vivo by promoting CD8+ T lymphocyte infiltration. CXCL11 is positively correlated with tumorinfiltrating CD8+ T cells in mice subjected to a transgenic administered with CXCL11-EL4 t-cell lymphoma cells. CD8 +T cell downregulation is associated with loss of CXCL11 antitumor effect in vivo. Upregulation of CXCL11 expression in EL4 cells transduced with CXCL11 promotes infiltration of total CD8+ and CD8+ CXCR3+ T lymphocyte and macrophages, with no impact on angiogenesis in EL4-CXCL11 tumors. This study demonstrated that local release of CXCL11 induces systemic tumor immunity [50,51].
Collagen XVII is an adhesive protein present in the basal epithelium. This protein is encoded by the COL17A1 gene and it modulates the growth and migration of cells [52]. COL17A1 is associated with a poor prognosis in some tumors [53,54]. Thangavelu et al. reported that hypomethylation and high expression of COL17A1 are correlated with poor prognosis in epithelial carcinoma [55]. Yan et al. observed that XVII collagen promoted aggressiveness and recurrence of gliomas [56]. The present findings showed a negative association between mRNA expression and DNA methylation of COL17A1 and a significant increase in CNV of COL17A1.  Furthermore, COL17A1 expression is modulated by CNV and DNA methylation. GSEA was conducted to explore the mechanism of action of COL17A1 in PC. The findings revealed that COL17A1 was correlated with the development of epithelial cells and the cell cycle. The transformation from monolayer to multilayer epithelial structures is a key marker of carcinogenesis [57], indicating that COL17A1 is an essential factor in the development and maintenance of multilayered epithelial structures [57]. These findings indicate that COL17A1 is implicated in the progression of PC by modulating the proliferation and differentiation of epithelial cells.
The KM survival curve showed significant differences between the molecular subtypes. The model's accuracy in clinical prognosis was evaluated by merging the base and generating KM survival curves. The prognostic model was combined with various clinicopathological characteristics to enhance its clinical significance. The findings revealed that that the constructed nomogram and Riskscore had the highest survival prediction ability.
The present study was based on the retrieval and reanalysis of samples from public databases, thus the findings are dependent on the quality, accuracy and completeness of the obtained samples. Furthermore, some genes were not available in the meta-cohort because analysis required comparison of different techniques for gene expressions quantification. However, the effect of deletion on subtype prediction was negligible as shown by the good molecular characteristics of the pancreatic cancer subtypes consistent with other previously reported subtypes. The clinical relevance of the three molecular subtypes is based on analysis of survival data. The treatment effect of these subtypes was not evaluated due to lack of treatment-related data.

Conclusion
In this study, we applied a bioinformatics approach based on CGAS.STING to classify pancreatic cancer into three molecular subtypes. We evaluated the differences in immunological characteristics, immunotherapy, chemotherapy and molecular pathways of the three molecular subtypes. Five genes were selected and used to build a prognostic risk model, which showed high clinical prognostic performance.
Acknowledgement: All authors contributed to the manuscript.
Funding Statement: The authors received no specific funding for this study.
Author Contributions: The authors confirm contribution to the paper as follows: study conception and design: Jianjun Tang, Xuefeng Wang. Sicong Jiang; data collection: Xuefeng Wang; analysis and interpretation of results: Sicong Jiang; draft manuscript preparation: Xinhong Zhou, Xiaofeng Wang, Lan Li. All authors reviewed the results and approved the final version of the manuscript.
Availability of Data and Materials: Data reported in this paper are available from the corresponding author upon reasonable request.
Ethics Approval: Not applicable. The manuscript does not involve any animal or human experiments.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.